Implementation and Performance of Cluster-Based File Replication in Large-Scale Distributed
Abstract
Large-scale distributed file systems supporting hundreds to thousands of hosts are becoming increasingly important as distributed systems rapidly grow in scale. Our examination of the workload in one large commercial environment shows that widespread sharing of unstable files is also common in such large systems. This increase in scale and sharing places a heavy load on critical resources such as file servers and networks, degrading file access performance. Traditional client-based file caching techniques are inadequate in such environments. This paper describes the implementation and initial performance study of Frolic, a cluster-based dynamic file replication system. A cluster is a group of workstations and one or more file servers on a local area network. Instead of keeping a copy of a widely shared file at each client workstation, Frolic dynamically replicates such files onto the cluster file servers, so that they become locally available. We compare the performance of Frolic, implemented on top of NFS, with that of native NFS in a configuration involving two clusters. File sizes and the number of consecutive accesses to a file from a particular cluster are varied in the sequential workload; workstation think times are varied in the concurrent workload. The results show that cluster-based file replication, as implemented in our Frolic prototype, can improve the performance of distributed file systems for a workload with widespread sharing of unstable files.
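The idea of replicating at the cluster file server rather than at each client can be illustrated with a toy model. The sketch below is not Frolic's protocol, which the abstract does not specify: the class names `HomeServer` and `ClusterServer`, the `write` helper, the file path, and the invalidate-on-write policy are all assumptions made for illustration. It only counts how many reads have to leave the cluster.

```python
from dataclasses import dataclass, field


@dataclass
class HomeServer:
    """Holds the primary copy of each file (hypothetical name)."""
    files: dict = field(default_factory=dict)
    remote_reads: int = 0  # reads served across cluster boundaries

    def read(self, path):
        self.remote_reads += 1
        return self.files[path]


@dataclass
class ClusterServer:
    """File server local to one cluster; keeps replicas of shared files."""
    home: HomeServer
    replicas: dict = field(default_factory=dict)

    def read(self, path):
        # Replicate on first access so later reads in this cluster stay local.
        if path not in self.replicas:
            self.replicas[path] = self.home.read(path)
        return self.replicas[path]

    def invalidate(self, path):
        # Drop a stale replica; the next read fetches a fresh copy.
        self.replicas.pop(path, None)


def write(home, clusters, path, data):
    """Update the primary copy and invalidate replicas (assumed policy)."""
    home.files[path] = data
    for cluster in clusters:
        cluster.invalidate(path)


home = HomeServer(files={"/shared/status.txt": b"v1"})
cluster_a = ClusterServer(home)
cluster_b = ClusterServer(home)

# Five consecutive reads from workstations in cluster A.
for _ in range(5):
    cluster_a.read("/shared/status.txt")
reads_before_write = home.remote_reads

# The file changes (it is "unstable"), so replicas are invalidated.
write(home, [cluster_a, cluster_b], "/shared/status.txt", b"v2")
data_after_write = cluster_a.read("/shared/status.txt")
reads_after_write = home.remote_reads
```

In this model, consecutive reads from one cluster cost a single remote fetch, and each write forces one more fetch per cluster that reads the file again. This is loosely why the number of consecutive accesses from a cluster is a natural parameter to vary in the sequential workload.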